Learning Spoken Words from Multisensory Input
Abstract
Speech recognition and speech translation are traditionally addressed by processing acoustic signals alone; nonlinguistic information is typically not used. In this paper, we present a new method that explores spoken word learning from naturally co-occurring multisensory information in a dyadic (two-person) conversation. It has been observed that the listener has a strong tendency to look toward objects referred to by the speaker during the conversation. In light of this, we propose to use eye gaze to integrate acoustic and visual signals and to build audio-visual lexicons of objects. With such data gathered from conversations in different languages, the spoken names of objects in different languages can be translated based on their visual semantics. We have developed a multimodal learning system and report the results of experiments using speech and video, together with eye movement records, as training data.
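The idea of pairing spoken words with gazed-at objects, and then translating through the shared object, can be illustrated with a toy sketch. This is not the system described in the abstract: it assumes speech has already been segmented into word tokens, that each utterance comes with a single object label standing in for the visual category under the listener's gaze, and that a simple conditional-frequency score is enough to pick a name. The function names `build_lexicon` and `translate` and all of the data are hypothetical.

```python
from collections import Counter, defaultdict


def build_lexicon(utterances):
    """Pick one word per object from (words, gazed_object) pairs.

    Assumption: each utterance is a list of word tokens plus the label of the
    object the listener was looking at while it was spoken.
    """
    cooc = defaultdict(Counter)   # object -> word -> co-occurrence count
    word_count = Counter()        # word -> number of utterances containing it
    for words, obj in utterances:
        for w in set(words):
            cooc[obj][w] += 1
            word_count[w] += 1

    lexicon = {}
    for obj, counts in cooc.items():
        # Score each word by the fraction of its occurrences that fall on this
        # object; break ties in favour of the more frequent word.
        lexicon[obj] = max(
            counts, key=lambda w: (counts[w] / word_count[w], counts[w])
        )
    return lexicon


def translate(lexicon_a, lexicon_b):
    """Map names in language A to names in language B via shared objects."""
    return {
        lexicon_a[obj]: lexicon_b[obj] for obj in lexicon_a if obj in lexicon_b
    }


# Hypothetical toy data: tokens and gaze labels are invented for illustration.
english = [
    (["where", "is", "the", "cup"], "cup"),
    (["the", "cup", "is", "red"], "cup"),
    (["where", "is", "the", "lamp"], "lamp"),
    (["the", "lamp", "is", "bright"], "lamp"),
]
german = [
    (["wo", "ist", "die", "tasse"], "cup"),
    (["die", "tasse", "ist", "rot"], "cup"),
    (["wo", "ist", "die", "lampe"], "lamp"),
    (["die", "lampe", "ist", "hell"], "lamp"),
]

lex_en = build_lexicon(english)
lex_de = build_lexicon(german)
pairs = translate(lex_en, lex_de)
print(pairs)
```

In this sketch, function words such as "the" or "die" score low because they occur while the listener looks at several different objects, whereas an object's name is concentrated on one gaze target. A real system would work from acoustic and visual features rather than clean tokens and labels.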
Similar resources
Grounded speech communication
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently, machines which process text and spoken language are not grounded in human-like ways. Instead, semantic representations in machines are highly abstract and have meaning only when interpreted by humans. We are interested ...
Grounded spoken language acquisition: experiments in word learning
Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational syste...
On the Integration of Grounding Language and Learning Objects
This paper presents a multimodal learning system that can ground spoken names of objects in their physical referents and learn to recognize those objects simultaneously from naturally co-occurring multisensory input. There are two technical problems involved: (1) the correspondence problem in symbol grounding – how to associate words (symbols) with their perceptually grounded meanings from mult...
Production Is Only Half the Story — First Words in Two East African Languages
Theories of early learning of nouns in children's vocabularies divide into those that emphasize input (language and non-linguistic aspects) and those that emphasize child conceptualisation. Most data though come from production alone, assuming that learning a word equals speaking it. Methodological issues can mean production and comprehension data within or across input languages are not compar...
The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud
N. Bagheri, M.A.; E. Abbasi, Ph.D.; M. GeramiPour, Ph.D. The present study was conducted to investigate the impact of language learning activities on the development of spoken language in 5-6-year-old children at private preschool center...
Publication date: 2002